Overview

Dataset statistics

Number of variables: 12
Number of observations: 15932992
Missing cells: 43472692
Missing cells (%): 22.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.3 GiB
Average record size in memory: 628.6 B
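The missing-cell percentage can be checked by hand from the figures in this report. The sketch below assumes that "cells" means variables times observations and that the per-column missing counts listed in the warnings are the only missing values; neither is stated explicitly by the report.

```python
# Figures copied from the dataset statistics and the missing-value warnings.
n_variables = 12
n_observations = 15_932_992
missing_cells = 43_472_692

# Per-column missing counts (current_filters, impressions, prices).
missing_by_column = {
    "current_filters": 14_779_880,
    "impressions": 14_346_406,
    "prices": 14_346_406,
}

# Assumption: total cells = variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"column sum matches: {sum(missing_by_column.values()) == missing_cells}")
```

Under these assumptions the three sparse columns account for every missing cell in the dataset.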

Variable types

CAT: 10
NUM: 2

Reproduction

Analysis started: 2020-02-24 20:19:54.075962
Analysis finished: 2020-02-24 21:37:01.836531
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

user_id has a high cardinality: 730803 distinct values
session_id has a high cardinality: 910683 distinct values
reference has a high cardinality: 400277 distinct values
platform has a high cardinality: 55 distinct values
city has a high cardinality: 34752 distinct values
current_filters has a high cardinality: 61980 distinct values
impressions has a high cardinality: 1059891 distinct values
prices has a high cardinality: 1066775 distinct values
current_filters has 14779880 (92.8%) missing values
impressions has 14346406 (90.0%) missing values
prices has 14346406 (90.0%) missing values
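One way to read the missing-value warnings for impressions and prices is to compare the number of populated rows with the action_type counts reported further down. The sketch below only compares totals; it is an assumption, not something the report confirms, that the populated rows are exactly the "clickout item" rows.

```python
# Figures copied from this report.
n_observations = 15_932_992
missing_impressions = 14_346_406  # the same count is reported for prices
clickout_rows = 1_586_586         # "clickout item" count under action_type

# Rows where impressions (and prices) hold a value.
populated_rows = n_observations - missing_impressions

print(populated_rows)
print(populated_rows == clickout_rows)
```

The totals agree, which is consistent with impressions and prices being recorded only on clickout rows, but a row-level check on the data would be needed to confirm it.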

Variables

user_id
Categorical

HIGH CARDINALITY
Distinct count: 730803
Unique (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value                  Count     Frequency (%)
6JWWFFNUMY6Y           6230      < 0.1%
0H73OEP6Z71O           4084      < 0.1%
7K4V4W05S7X7           4077      < 0.1%
SX3I42SKZEVH           3876      < 0.1%
Q46K4RJHTQFR           3810      < 0.1%
G7U04A2HQFSG           3607      < 0.1%
M8E88OK4G3IE           3416      < 0.1%
A5ZFRVCM2Z1L           3358      < 0.1%
EQKV6819ZD7M           3320      < 0.1%
CQM1034RBOZI           3141      < 0.1%
Other values (730793)  15894073  99.8%

Length

Max length: 12
Mean length: 12
Min length: 12

Value             Count  Frequency (%)
Uppercase_Letter  26     72.2%
Decimal_Number    10     27.8%

Value   Count  Frequency (%)
Latin   26     72.2%
Common  10     27.8%

Value  Count  Frequency (%)
ASCII  36     100.0%

session_id
Categorical

HIGH CARDINALITY
Distinct count: 910683
Unique (%): 5.7%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value                  Count     Frequency (%)
3167404ed3197          3522      < 0.1%
948641e533837          2816      < 0.1%
9233fb83c116b          2800      < 0.1%
191ae48e3cb8e          2648      < 0.1%
c9b863c921a2d          2640      < 0.1%
c4dc91b78ded1          2518      < 0.1%
4c8e1e29b93fc          2340      < 0.1%
b34847506ba7f          2310      < 0.1%
58a263c18b945          2219      < 0.1%
e9a8f4e36ea10          2216      < 0.1%
Other values (910673)  15906963  99.8%

Length

Max length: 13
Mean length: 13
Min length: 13

Value             Count  Frequency (%)
Lowercase_Letter  22     68.8%
Decimal_Number    10     31.2%

Value   Count  Frequency (%)
Latin   22     68.8%
Common  10     31.2%

Value  Count  Frequency (%)
ASCII  32     100.0%

timestamp
Real number (ℝ≥0)

Distinct count: 518048
Unique (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1541304041.0163977
Minimum: 1541030408
Maximum: 1541548799
Zeros: 0
Zeros (%): 0.0%
Memory size: 121.6 MiB

Quantile statistics

Minimum: 1541030408
5-th percentile: 1541068577
Q1: 1541173676
median: 1541319766
Q3: 1541436748
95-th percentile: 1541529510
Maximum: 1541548799
Range: 518391
Interquartile range (IQR): 263072
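The timestamp values are easier to interpret as dates. The sketch below assumes they are Unix epoch seconds in UTC, which the report does not state, and converts the reported minimum and maximum.

```python
from datetime import datetime, timezone

# Minimum and maximum copied from the quantile statistics.
minimum = 1_541_030_408
maximum = 1_541_548_799

# Assumption: values are seconds since 1970-01-01 00:00:00 UTC.
start = datetime.fromtimestamp(minimum, tz=timezone.utc)
end = datetime.fromtimestamp(maximum, tz=timezone.utc)

print(start.isoformat())
print(end.isoformat())
print(maximum - minimum)  # range in seconds, as reported
print(end - start)        # the same range as a timedelta
```

Under that assumption the data covers just under six days, from the start of 1 November 2018 to the end of 6 November 2018 (UTC).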

Descriptive statistics

Standard deviation: 150309.1017
Coefficient of variation (CV): 9.752073421e-05
Kurtosis: -1.220562807
Mean: 1541304041
Median Absolute Deviation (MAD): 131043.0426
Skewness: -0.1072420208
Sum: 2.455758496e+16
Variance: 2.259282606e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.54103041e+09 1.54103045e+09 1.54103050e+09 1.54103057e+09 1.54103057e+09 ... 1.54154870e+09 1.54154870e+09 1.54154873e+09 1.54154873e+09 1.54154880e+09], "bayesian blocks" binning strategy used)
Value                  Count     Frequency (%)
1541443523             172       < 0.1%
1541449443             146       < 0.1%
1541362622             144       < 0.1%
1541536100             141       < 0.1%
1541440488             139       < 0.1%
1541449515             136       < 0.1%
1541364764             135       < 0.1%
1541275821             134       < 0.1%
1541546026             133       < 0.1%
1541444738             133       < 0.1%
Other values (518038)  15931579  > 99.9%

Value       Count  Frequency (%)
1541030408  1      < 0.1%
1541030410  1      < 0.1%
1541030412  1      < 0.1%
1541030414  1      < 0.1%
1541030423  3      < 0.1%

Value       Count  Frequency (%)
1541548799  7      < 0.1%
1541548798  11     < 0.1%
1541548797  6      < 0.1%
1541548796  8      < 0.1%
1541548795  17     < 0.1%

step
Real number (ℝ≥0)

Distinct count: 3522
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.58612186587428
Minimum: 1
Maximum: 3522
Zeros: 0
Zeros (%): 0.0%
Memory size: 121.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 8
median: 28
Q3: 81
95-th percentile: 304
Maximum: 3522
Range: 3521
Interquartile range (IQR): 73

Descriptive statistics

Standard deviation: 144.5524398
Coefficient of variation (CV): 1.912420378
Kurtosis: 58.68380695
Mean: 75.58612187
Median Absolute Deviation (MAD): 78.98644605
Skewness: 5.879137135
Sum: 1204313075
Variance: 20895.40785
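Several of the step statistics are derived from one another, so they can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from this report; the formulas (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) are the usual definitions and are assumed to be the ones the tool applies.

```python
# Figures copied from the step statistics and the dataset overview.
n_observations = 15_932_992
total = 1_204_313_075          # Sum
minimum, maximum = 1, 3522
q1, q3 = 8, 81
std = 144.5524398              # Standard deviation

mean = total / n_observations  # reported: 75.58612187
value_range = maximum - minimum  # reported: 3521
iqr = q3 - q1                  # reported: 73
cv = std / mean                # reported: 1.912420378
variance = std ** 2            # reported: 20895.40785

print(f"mean={mean:.8f} range={value_range} iqr={iqr}")
print(f"cv={cv:.9f} variance={variance:.5f}")
```

A CV close to 2 together with a median of 28 against a maximum of 3522 reflects the long right tail that the skewness and kurtosis values also indicate.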
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.0000e+00 1.5000e+00 2.5000e+00 3.5000e+00 4.5000e+00 ... 2.3405e+03 2.5185e+03 2.6485e+03 2.8165e+03 3.5220e+03], "bayesian blocks" binning strategy used)
Value                Count     Frequency (%)
1                    910732    5.7%
2                    712452    4.5%
3                    584269    3.7%
4                    490674    3.1%
5                    426992    2.7%
6                    377810    2.4%
7                    342413    2.1%
8                    314490    2.0%
9                    292139    1.8%
10                   274139    1.7%
Other values (3512)  11206882  70.3%

Value  Count   Frequency (%)
1      910732  5.7%
2      712452  4.5%
3      584269  3.7%
4      490674  3.1%
5      426992  2.7%

Value  Count  Frequency (%)
3522   1      < 0.1%
3521   1      < 0.1%
3520   1      < 0.1%
3519   1      < 0.1%
3518   1      < 0.1%

action_type
Categorical

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value                    Count     Frequency (%)
interaction item image   11860750  74.4%
clickout item            1586586   10.0%
filter selection         695917    4.4%
search for destination   403066    2.5%
change of sort order     400584    2.5%
interaction item info    285402    1.8%
interaction item rating  217246    1.4%
interaction item deals   193794    1.2%
search for item          152203    1.0%
search for poi           137444    0.9%

Length

Max length: 23
Mean length: 20.65128452
Min length: 13

Value             Count  Frequency (%)
Lowercase_Letter  18     94.7%
Space_Separator   1      5.3%

Value   Count  Frequency (%)
Latin   18     94.7%
Common  1      5.3%

Value  Count  Frequency (%)
ASCII  19     100.0%

reference
Categorical

HIGH CARDINALITY
Distinct count: 400277
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value                    Count     Frequency (%)
interaction sort button  235027    1.5%
Sort by Price            78922     0.5%
price only               78863     0.5%
Hotel                    58039     0.4%
5 Star                   46193     0.3%
Best Value               43319     0.3%
price and recommended    43317     0.3%
4 Star                   42625     0.3%
Resort                   42343     0.3%
Hostal (ES)              35028     0.2%
Other values (400267)    15229316  95.6%

Length

Max length: 150
Mean length: 7.262445999
Min length: 2

Value                Count  Frequency (%)
Lowercase_Letter     121    54.5%
Uppercase_Letter     67     30.2%
Decimal_Number       10     4.5%
Other_Letter         7      3.2%
Other_Punctuation    6      2.7%
Space_Separator      2      0.9%
Dash_Punctuation     2      0.9%
Close_Punctuation    1      0.5%
Initial_Punctuation  1      0.5%
Final_Punctuation    1      0.5%
Other values (4)     4      1.8%

Value     Count  Frequency (%)
Latin     157    70.7%
Common    27     12.2%
Cyrillic  23     10.4%
Greek     10     4.5%
Han       5      2.3%

Value                 Count  Frequency (%)
ASCII                 74     64.9%
Cyrillic              23     20.2%
Latin Ext Additional  8      7.0%
CJK                   5      4.4%
Punctuation           3      2.6%
IPA Ext               1      0.9%

platform
Categorical

HIGH CARDINALITY
Distinct count: 55
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value              Count    Frequency (%)
BR                 2634304  16.5%
US                 1627520  10.2%
DE                 1001105  6.3%
UK                 918900   5.8%
MX                 833785   5.2%
IN                 679747   4.3%
AU                 595003   3.7%
TR                 564271   3.5%
JP                 547480   3.4%
IT                 527046   3.3%
Other values (45)  6003831  37.7%

Length

Max length: 2
Mean length: 2
Min length: 2

Value             Count  Frequency (%)
Uppercase_Letter  25     100.0%

Value  Count  Frequency (%)
Latin  25     100.0%

Value  Count  Frequency (%)
ASCII  25     100.0%

city
Categorical

HIGH CARDINALITY
Distinct count: 34752
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value                   Count     Frequency (%)
London, United Kingdom  326255    2.0%
Paris, France           262060    1.6%
Istanbul, Turkey        230458    1.4%
New York, USA           223320    1.4%
Rio de Janeiro, Brazil  161973    1.0%
Amsterdam, Netherlands  150529    0.9%
Rome, Italy             146798    0.9%
Cancun, Mexico          146004    0.9%
Tokyo, Japan            141557    0.9%
Berlin, Germany         134252    0.8%
Other values (34742)    14009786  87.9%

Length

Max length: 55
Mean length: 17.62062355
Min length: 8

Value              Count  Frequency (%)
Lowercase_Letter   97     56.4%
Uppercase_Letter   58     33.7%
Other_Punctuation  5      2.9%
Decimal_Number     5      2.9%
Space_Separator    2      1.2%
Dash_Punctuation   2      1.2%
Final_Punctuation  1      0.6%
Modifier_Symbol    1      0.6%
Modifier_Letter    1      0.6%

Value   Count  Frequency (%)
Latin   154    89.5%
Common  17     9.9%
Greek   1      0.6%

Value                 Count  Frequency (%)
ASCII                 64     87.7%
Latin Ext Additional  6      8.2%
Punctuation           2      2.7%
Modifier Letters      1      1.4%

device
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 121.6 MiB
Value    Count    Frequency (%)
mobile   7643538  48.0%
desktop  7003938  44.0%
tablet   1285516  8.1%

Length

Max length: 7
Mean length: 6.439587116
Min length: 6

Value             Count  Frequency (%)
Lowercase_Letter  12     100.0%

Value  Count  Frequency (%)
Latin  12     100.0%

Value  Count  Frequency (%)
ASCII  12     100.0%

current_filters
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 61980
Unique (%): 5.4%
Missing: 14779880
Missing (%): 92.8%
Memory size: 121.6 MiB
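The report does not say which denominator "Unique (%)" uses. The reported figures are consistent with distinct count divided by the number of non-missing values rather than by all observations; the sketch below checks that reading against four variables, using only numbers from this report.

```python
n_observations = 15_932_992

# (distinct count, missing count) as reported for each variable.
reported = {
    "user_id": (730_803, 0),
    "current_filters": (61_980, 14_779_880),
    "impressions": (1_059_891, 14_346_406),
    "prices": (1_066_775, 14_346_406),
}

# Assumption: Unique (%) = 100 * distinct / (observations - missing).
unique_pct = {}
for name, (distinct, missing) in reported.items():
    non_missing = n_observations - missing
    unique_pct[name] = round(100 * distinct / non_missing, 1)
    print(f"{name}: {unique_pct[name]}%")
```

For current_filters this explains why 61980 distinct values in nearly 16 million rows shows up as 5.4%: the share is taken over the roughly 1.15 million rows that actually carry a filter.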
Value                                                Count     Frequency (%)
Sort by Price                                        159376    1.0%
Focus on Distance                                    80143     0.5%
Best Value                                           74923     0.5%
5 Star|Hotel|Motel|Resort|Hostal (ES)                26106     0.2%
Sort By Distance                                     21754     0.1%
5 Star|4 Star|Hotel|Motel|Resort|Hostal (ES)         21718     0.1%
5 Star|4 Star|3 Star|Hotel|Motel|Resort|Hostal (ES)  17854     0.1%
Excellent Rating                                     17506     0.1%
Very Good Rating                                     16384     0.1%
Focus on Rating                                      14673     0.1%
Other values (61970)                                 702675    4.4%
(Missing)                                            14779880  92.8%

Length

Max length: 259
Mean length: 5.044709807
Min length: 3

Value              Count  Frequency (%)
Lowercase_Letter   25     39.7%
Uppercase_Letter   23     36.5%
Decimal_Number     6      9.5%
Other_Punctuation  3      4.8%
Math_Symbol        2      3.2%
Space_Separator    1      1.6%
Dash_Punctuation   1      1.6%
Close_Punctuation  1      1.6%
Open_Punctuation   1      1.6%

Value   Count  Frequency (%)
Latin   48     76.2%
Common  15     23.8%

Value  Count  Frequency (%)
ASCII  63     100.0%

impressions
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 1059891
Unique (%): 66.8%
Missing: 14346406
Missing (%): 90.0%
Memory size: 121.6 MiB
Value  Count  Frequency (%)
1668573  59  < 0.1%
2262316  53  < 0.1%
2717657  53  < 0.1%
128010|275217|31740|265842|266112|120896|4329278|5737070|265822|1289285|1501497|1568459|4733508|32102|5705516|3922410|3375350|266047|4523644|266137|120381|120915|270587|120385|263887  48  < 0.1%
2343986|8288974  48  < 0.1%
20897|20669|20677|20758|20766|1220350|20736|20674|20789|20737|2669764|6286688|20662|20752|1217792|20745|7127542|20680|84987|20712|20832|945823|20837|81466|20725  42  < 0.1%
2552508|2628141|320791|521526|383116|521261|2212958|3135658|5963374|1979851|1837395|6819172|3201066|521546|4143828|5922370|438096|3830270|1131877|8866872|1983739|1837423|9344330|2103318|2617715  41  < 0.1%
9377362|2667270|2663095|1158907|1535635|2788752|10511626|1018283|4989528|10601506|3196195|10632766|2702696|1403190|2704220|4488722|4846662|10644372|7191226|10061308|4712380|1018260|4525298|3583390|4777248  38  < 0.1%
128010|275217|31740|265842|266112|120896|5737070|4329278|265822|1289285|1501497|1568459|4733508|3922410|32102|5705516|3375350|266047|4523644|266137|120381|120915|270587|120385|263887  36  < 0.1%
4182500|4431788|6071334  35  < 0.1%
Other values (1059881)  1586133  10.0%
(Missing)  14346406  90.0%
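The impressions, prices and current_filters columns are profiled as plain strings, which is why nearly every value counts as distinct. The values shown above look like `|`-delimited lists, so a first preprocessing step might split them. This is a minimal sketch: `parse_pipe_list` is a hypothetical helper, and treating `|` as a list separator is an assumption drawn from the sample values, not something the report states.

```python
from typing import List, Optional


def parse_pipe_list(cell: Optional[str]) -> List[str]:
    """Split a pipe-delimited cell; a missing cell yields an empty list."""
    if cell is None:
        return []
    return cell.split("|")


# Example values copied from the frequency tables in this report.
impressions = [int(item) for item in parse_pipe_list("4182500|4431788|6071334")]
filters = parse_pipe_list("5 Star|Hotel|Motel|Resort|Hostal (ES)")

print(impressions)
print(filters)
print(parse_pipe_list(None))
```

When loading with pandas, missing cells arrive as NaN rather than None, so a real pipeline would need an equivalent check for that case.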
 

Length

Max length: 224
Mean length: 19.54954022
Min length: 3

Value             Count  Frequency (%)
Decimal_Number    10     76.9%
Lowercase_Letter  2      15.4%
Math_Symbol       1      7.7%

Value   Count  Frequency (%)
Common  11     84.6%
Latin   2      15.4%

Value  Count  Frequency (%)
ASCII  13     100.0%

prices
Categorical

HIGH CARDINALITY
MISSING
UNIFORM
Distinct count: 1066775
Unique (%): 67.2%
Missing: 14346406
Missing (%): 90.0%
Memory size: 121.6 MiB
Value                   Count     Frequency (%)
26                      72        < 0.1%
27                      68        < 0.1%
45                      68        < 0.1%
18                      65        < 0.1%
30                      59        < 0.1%
32                      51        < 0.1%
34                      51        < 0.1%
57                      51        < 0.1%
28                      50        < 0.1%
140                     49        < 0.1%
Other values (1066765)  1586002   10.0%
(Missing)               14346406  90.0%

Length

Max length: 124
Mean length: 10.35702077
Min length: 1

Value             Count  Frequency (%)
Decimal_Number    10     76.9%
Lowercase_Letter  2      15.4%
Math_Symbol       1      7.7%

Value   Count  Frequency (%)
Common  11     84.6%
Latin   2      15.4%

Value  Count  Frequency (%)
ASCII  13     100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
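The definition above can be sketched in plain Python. This is an illustrative implementation with made-up sample data, not the code the profiling tool runs; the 1/n factors in the covariance and the standard deviations cancel, so sums of deviations are used directly.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
# Shifting and rescaling each variable separately leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(r, r_rescaled)
```

Only the two numeric variables in this dataset (timestamp and step) are eligible for this coefficient.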

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
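As a sketch of that recipe: rank each variable, then apply the Pearson formula to the ranks. The helper names and sample data are illustrative, and assigning tied values the average of their ranks is a common convention assumed here rather than something the report specifies.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r; assumes equal lengths and non-constant inputs."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(r, rho)
```

On this example ρ reaches 1 while r stays below 1, which is the difference between monotonic and linear association described above.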

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
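The pair-counting definition can be sketched directly. As worded, it corresponds to the variant usually called tau-a, with no correction for ties; library implementations often apply a tie correction, so results can differ on data with ties. The function name and sample data are illustrative, and the quadratic loop is only practical for small inputs.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs, without tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair counts as neither.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up sample data: 8 concordant and 2 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

tau = kendall_tau_a(x, y)
print(tau)
```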

Missing values

Sample

First rows

   user_id       session_id     timestamp   step  action_type             reference  platform  city               device  current_filters  impressions  prices
0  00RL8Z82B2Z1  aff3928535f48  1541037460  1     search for poi          Newtown    AU        Sydney, Australia  mobile  NaN              NaN          NaN
1  00RL8Z82B2Z1  aff3928535f48  1541037522  2     interaction item image  666856     AU        Sydney, Australia  mobile  NaN              NaN          NaN
2  00RL8Z82B2Z1  aff3928535f48  1541037522  3     interaction item image  666856     AU        Sydney, Australia  mobile  NaN              NaN          NaN
3  00RL8Z82B2Z1  aff3928535f48  1541037532  4     interaction item image  666856     AU        Sydney, Australia  mobile  NaN              NaN          NaN
4  00RL8Z82B2Z1  aff3928535f48  1541037532  5     interaction item image  109038     AU        Sydney, Australia  mobile  NaN              NaN          NaN
5  00RL8Z82B2Z1  aff3928535f48  1541037532  6     interaction item image  666856     AU        Sydney, Australia  mobile  NaN              NaN          NaN
6  00RL8Z82B2Z1  aff3928535f48  1541037532  7     interaction item image  109038     AU        Sydney, Australia  mobile  NaN              NaN          NaN
7  00RL8Z82B2Z1  aff3928535f48  1541037532  8     interaction item image  666856     AU        Sydney, Australia  mobile  NaN              NaN          NaN
8  00RL8Z82B2Z1  aff3928535f48  1541037542  9     interaction item image  109038     AU        Sydney, Australia  mobile  NaN              NaN          NaN
9  00RL8Z82B2Z1  aff3928535f48  1541037542  10    interaction item image  109038     AU        Sydney, Australia  mobile  NaN              NaN          NaN

Last rows

          user_id       session_id     timestamp   step  action_type             reference                platform  city           device   current_filters    impressions  prices
15932982  ZYNMLE3MV3LK  62728015bec05  1541544480  10    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932983  ZYNMLE3MV3LK  62728015bec05  1541544480  11    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932984  ZYNMLE3MV3LK  62728015bec05  1541544480  12    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932985  ZYNMLE3MV3LK  62728015bec05  1541544480  13    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932986  ZYNMLE3MV3LK  62728015bec05  1541544480  14    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932987  ZYNMLE3MV3LK  62728015bec05  1541544490  15    interaction item image  6617798                  PT        Paris, France  desktop  NaN                NaN          NaN
15932988  ZYNMLE3MV3LK  62728015bec05  1541544491  16    clickout item           6617798                  PT        Paris, France  desktop  Focus on Distance  6617798|1263420|9567886|1161323|149768|1890735|48766|49244|18208|129443|6002460|3213646|48511|49976|50117|3503750|153375|49847|4342488|12260|2712342|48497|11933|1714483|123668758|96|55|75|90|60|233|104|150|145|328|207|150|181|135|99|495|170|118|259|73|169|87|485|171
15932989  ZYNMLE3MV3LK  62728015bec05  1541544540  17    clickout item           2712342                  PT        Paris, France  desktop  Focus on Distance  6617798|1263420|9567886|1161323|149768|1890735|48766|49244|18208|129443|6002460|3213646|48511|49976|50117|3503750|153375|49847|4342488|12260|2712342|48497|11933|1714483|123668758|96|55|75|90|60|233|104|150|145|328|207|150|181|135|99|495|170|118|259|73|169|87|485|171
15932990  ZYNMLE3MV3LK  62728015bec05  1541544967  18    change of sort order    interaction sort button  PT        Paris, France  desktop  NaN                NaN          NaN
15932991  ZYNMLE3MV3LK  62728015bec05  1541544973  19    clickout item           1161323                  PT        Paris, France  desktop  Focus on Distance  6617798|1263420|9567886|1161323|149768|1890735|48766|49244|18208|129443|6002460|3213646|48511|49976|50117|3503750|153375|49847|4342488|12260|2712342|48497|11933|1714483|123668758|96|55|75|90|60|233|104|150|145|328|207|150|181|135|99|495|170|118|259|73|169|87|485|171